ABSTRACT


Today, there is a vast amount of online learning material. However, because information about the prerequisites needed to master concepts is often missing, learners spend a lot of time identifying the right content for mastering them. A system that captures the prerequisites underlying different concepts can improve the quality of learning and save learners time. In this work, we propose UPreG, an unsupervised approach for automatically inferring prerequisite relationships between concepts using NLP techniques. Our approach extracts concepts from the unstructured text of MOOC (Massive Open Online Course) course descriptions, measures the semantic relatedness between concepts, and statistically infers prerequisite relationships between related concepts. We conducted both qualitative and quantitative studies to validate the effectiveness of the approach. As there are no ground truth labels for these prerequisite relations, we evaluated them through a user study. We build a concept graph from the inferred prerequisite relations and demonstrate a few examples of learning maps generated from the graph. The learning maps provide prerequisite information and learning paths for different concepts.
1. INTRODUCTION


In today's fast-paced world, skill development and a strong foundation in fundamental concepts are becoming crucial for career growth. MOOCs, which offer a wide variety of online courses, have become ubiquitous among learners who want to acquire knowledge and become competent in their field of interest. On this journey, learners need to know the order in which to learn different concepts to attain a good level of mastery in a specific topic. Knowing the prerequisites of a topic improves the learning experience and influences learners' achievements [20]. Prerequisite concepts are the concepts one must know or understand before attempting to learn or understand something new.


With the increasing amount of educational data available, automatic discovery of concept prerequisite relations has become both an emerging research opportunity and an open challenge. There is growing interest in techniques for automatically inferring prerequisite relations between concepts [17][20], and applications such as curriculum planning [23], learning assistants [10], and automated reading list generation [9] have been built on such techniques.


Course-level prerequisites are typically curated manually by experts, and they help identify prerequisite relations between the concepts covered within those courses. For example, concepts in a course on Optimization are prerequisites to concepts in a course on Deep Learning: the Gradient Descent algorithm is a prerequisite for understanding the Backpropagation algorithm used in deep neural networks. However, manually created relations do not scale to real-world online applications. Modern applications support learning content from a wide variety of domains and cater to learners from many educational backgrounds, and creating prerequisite relations manually for them is expensive and time-consuming. Hence, it is necessary to develop solutions that infer prerequisite relations automatically.


In this work, we propose UPreG, an unsupervised approach for automatically inferring prerequisite relationships between concepts using NLP techniques. We build a concept graph that captures concepts and the prerequisite relations between them. Concepts here refer to technologies, programming languages, tools, and topics in the Software and Computer Science domain. The concept graph can be leveraged to find the right content for a learner, including prerequisite content. We conducted both qualitative and quantitative studies to validate the effectiveness of our approach. As there are no ground truth labels for these prerequisite relations, we evaluated them through a user study. We observed that our approach effectively infers prerequisite relations between concepts. The approach can also be extended to other domains.

Figure 1: Flow Diagram


This paper is structured as follows. We present related work in Section 2. In Section 3, we describe our approach for concept graph generation, followed by the evaluation methodology and results in Section 4. In Section 5, we discuss the challenges we encountered while building the concept graph. Finally, Section 6 concludes the paper and outlines future work.


2. RELATED WORK 


Pan et al. [17] propose a learning-based method that learns latent representations of course concepts; they define various features and train a classifier to identify prerequisite relations among concepts. Roy et al. [20] proposed PREREQ, a supervised learning method for inferring concept prerequisite relations. The approach uses latent representations of concepts obtained from the Pairwise Latent Dirichlet Allocation model and a neural network based architecture, and it assumes that concept prerequisites are available to train the supervised model. Yu et al. [24] present an improved version, PREREQ-S, which introduces students' video watching order to enhance the video dependency network. They sort each student's watched videos by time and use these sequences to replace the video sequences. They apply two simple DNN models, which first encode the embeddings of the concept pairs and then train an MLP to classify the prerequisite pairs. Alzetta et al. [3] applied a deep learning based approach to extract prerequisite relations between the educational concepts of a textbook. Lu et al. [13] proposed iPRL, an iterative prerequisite relation learning framework that combines a learning-based model and a recovery-based model to leverage both concept pair features and dependencies among learning materials. Liang et al. [12] addressed the problem of recovering concept prerequisite relations from university course dependencies, and they [11] further applied active learning to the concept prerequisite learning problem. Pal et al. [16] proposed a rule-based method to find the order of concepts from textbooks. Most prior work assumes that prerequisite relationship pairs are available as ground truth and applies supervised learning. However, acquiring labeled prerequisite pairs is time-consuming and expensive, and a major drawback of supervised learning is that it does not perform well across domains [16]. To the best of our knowledge, we are the first to apply an unsupervised approach to extract prerequisite relationships in the software domain.


3. APPROACH 


In this section, we discuss our approach to building the concept graph. It is a directed graph whose nodes represent concepts and whose edges represent prerequisite relationships between them. Building the graph involves three steps: representing concepts, measuring the semantic similarity between concepts, and identifying the prerequisite relationships between them.


3.1 Concept Representation 

The descriptions of MOOC courses contain rich information about the concepts taught to learners. However, many courses do not have annotated course tags representing the concepts they teach, and manually creating course tags from course content is expensive and time-consuming [13]. Hence, the concepts must be extracted from the course content using text mining approaches. We collected course metadata from MOOC platforms (Udemy and edX) and from our internal Learning Management System. We apply Latent Dirichlet Allocation (LDA) [4], a topic modeling algorithm, to each course description to extract concepts. The algorithm generates a topic distribution for each course description, and we select the topic with the highest probability as the one that best represents the concepts the course covers. After several iterations, we found that setting the number of topics to extract to k = 5 gave the best results. In total, we extract 9750 unique concepts.
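The topic-selection step can be sketched as follows. This is a minimal illustration, assuming the LDA output for a course is already available as (topic_id, probability) pairs for the course and (word, probability) pairs for each topic, the shapes returned by gensim's `LdaModel.get_document_topics` and `LdaModel.show_topic`. The function name `concepts_from_topics`, the `topn` cut-off and the toy numbers are hypothetical, not taken from the paper.

```python
from typing import Dict, List, Sequence, Tuple

def concepts_from_topics(
    doc_topics: Sequence[Tuple[int, float]],
    topic_words: Dict[int, List[Tuple[str, float]]],
    topn: int = 10,
) -> List[str]:
    """Return the top words of the most probable topic for one course description."""
    # Select the topic with the highest probability in the course's topic distribution.
    best_topic, _ = max(doc_topics, key=lambda pair: pair[1])
    # Rank that topic's words by probability (highest first) and keep the top-n as concepts.
    ranked = sorted(topic_words[best_topic], key=lambda pair: pair[1], reverse=True)
    return [word for word, _ in ranked[:topn]]


# Made-up LDA output for a single course (three topics for brevity; the paper uses k = 5).
doc_topics = [(0, 0.10), (1, 0.75), (2, 0.15)]
topic_words = {
    0: [("python", 0.30), ("pandas", 0.20)],
    1: [("jquery", 0.20), ("javascript", 0.40), ("dom", 0.10)],
    2: [("sql", 0.35), ("database", 0.25)],
}
print(concepts_from_topics(doc_topics, topic_words, topn=2))  # ['javascript', 'jquery']
```

Collecting the returned words for every course into one set would give the pool of unique concepts.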


3.2 Semantic similarity between concepts 

A semantic similarity measure quantifies how related two concepts are: concepts that appear in the same context or appear together very often have higher semantic similarity scores. Computing semantic similarity filters out noise in the results of the topic modeling algorithm and reduces the number of weak relations in the concept graph. It also helps prerequisite identification, because concepts that appear in similar contexts are more likely to stand in a prerequisite relation. This improves the selection of candidate pairs for the concept graph.
Course title: JavaScript: Understanding the Weird Parts

Course description

JavaScript is the language that modern developers need to know, and know well. truly knowing JavaScript will get you a job, and enable you to build quality web and server applications. note: this course includes information on ECMAScript 6 (es6) the next version of JavaScript!

in this course you will gain a deep understanding of JavaScript, learn how JavaScript works under the hood, and how that knowledge helps you avoid common pitfalls and drastically improve your ability to debug problems. you will find clarity in the parts that others, even experienced coders, may find weird, odd, and at times incomprehensible. you'll learn the beauty and deceptive power of this language that is at the forefront of modern software development today. This course will cover such advanced concepts as objects and object literals, function expressions, prototypical inheritance, functional programming, scope chains, function constructors (plus new es6 features), immediately invoked function expressions (iifes), call, apply, bind, and more. we'll take a deep dive into the source code of popular frameworks such as jQuery and underscore to see how you can use your understanding of JavaScript to learn (and borrow) from other's good code. finally, you'll learn the foundations of how to build your own JavaScript framework or library. What you'll learn in this course will make you a better JavaScript developer, and improve your abilities in AngularJS, NodeJS, jQuery, react, ember, mongo dB, and all other JavaScript-based technologies! learn to love JavaScript, and code in it well. note: in this course you'll also get downloadable source code. you will often be provided with ‘starter’ code, giving you the base for you to start writing your code, and ‘finished’ code to compare your code to ...

Objectives:
"Grasp how JavaScript works and it's fundamental concepts"
"Write solid, good JavaScript code"
"Understand advanced concepts such as closures, prototypal inheritance, ..."
"Drastically improve your ability to debug problems in JavaScript."
"Avoid common pitfalls and mistakes other JavaScript coders make"
"Understand the source code of popular JavaScript frameworks"
"Build your own JavaScript framework or library"

Concepts extracted from topic modelling
JavaScript, jQuery, array, string, dom, event, library, ajax, object, loop

Course labels provided by Udemy
JavaScript

Figure 2: Concepts generated for a JavaScript course
Figure 3: Stack Overflow questions and tags


To measure the semantic similarity between concepts, we compute Pointwise Mutual Information (PMI) and Word2Vec cosine similarity scores. The semantic similarity score between two concepts is the weighted average of these two scores.


3.2.1 Pointwise Mutual Information 

PMI is a measure of association from information theory [6]. It measures how likely two concepts are to occur together compared with their independent occurrences in the data. To compute the PMI of concept pairs, we use the tags of Stack Overflow questions obtained from Stack Overflow data dumps. The author of a Stack Overflow question is asked to provide tags associated with the question (as shown in Figure 3), so tags that often appear together across questions are likely to be strongly related: the higher the score between two concepts, the more related they are. We assume that concepts that co-occur across a large set of questions are correlated. To compute the PMI scores, we leverage a Stack Overflow dump of 1,000,000 questions along with their tags [21]. The PMI score between any two concepts c1 and c2 is defined as:

$$\mathrm{PMI}(c_1, c_2) = \max\left(0,\ \frac{\log\left[p(c_1)\,p(c_2)\right]}{\log p(c_1, c_2)} - 1\right) \qquad (1)$$

Here p(c1, c2) is the probability of co-occurrence of concepts c1 and c2, i.e., the fraction of Stack Overflow questions in which c1 and c2 co-occur as tags, and p(c1) and p(c2) are the probabilities of the independent occurrence of c1 and c2 as tags across all Stack Overflow questions. The resulting score is normalized to take values between 0 and 1, which puts the PMI and Word2Vec similarity scores on a comparable scale when taking their weighted average.
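The normalized PMI score described above can be computed directly from tag co-occurrence counts. This is a minimal sketch, assuming each Stack Overflow question is represented as a set of tag strings; the function name `normalized_pmi` and the toy questions are hypothetical. It uses the algebraically equivalent form log[p(c1, c2) / (p(c1) p(c2))] / (-log p(c1, c2)) and clips negative values to 0.

```python
import math
from typing import Iterable, Set

def normalized_pmi(questions: Iterable[Set[str]], c1: str, c2: str) -> float:
    """Normalized PMI of two tags over a collection of questions, clipped to [0, 1]."""
    total = n1 = n2 = n12 = 0
    for tags in questions:
        total += 1
        has1, has2 = c1 in tags, c2 in tags
        n1 += has1
        n2 += has2
        n12 += has1 and has2
    if total == 0 or n12 == 0:
        return 0.0  # the tags never co-occur: no evidence of association
    p1, p2, p12 = n1 / total, n2 / total, n12 / total
    if p12 == 1.0:
        return 1.0  # both tags appear on every question: maximal association
    npmi = math.log(p12 / (p1 * p2)) / -math.log(p12)
    return max(0.0, npmi)


# Toy questions, each given as its set of tags.
questions = [{"javascript", "jquery"}, {"javascript", "jquery"}, {"javascript", "node.js"}, {"python"}]
print(round(normalized_pmi(questions, "javascript", "jquery"), 3))  # 0.415
```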


3.2.2 Word2Vec Embeddings 


Raw word frequency is not a good measure of association between words: it is very skewed and not very discriminative, and it does not capture the kinds of contexts shared between words, which word embedding techniques do capture [2]. We therefore apply the Word2Vec approach to learn the semantic relatedness between concepts. Word2Vec is based on the intuition that words used in similar contexts lie closer together in the embedding space; it uses a neural network model to learn word associations from a large text corpus. We use the skip-gram model [15] to learn 300-dimensional word embeddings, i.e., low-dimensional vector representations of the extracted concepts. We train the model on a text corpus of course descriptions and objectives from 64,150 courses using the Python library gensim [18], with default values for the remaining parameters. Some of the resulting Word2Vec similarity scores between concepts are shown in Table 2. The Word2Vec (W2V) similarity score between two concepts is computed as the cosine similarity between their word embeddings:
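$$\mathrm{W2V}(c_1, c_2) = \frac{\mathbf{c}_1 \cdot \mathbf{c}_2}{\lVert \mathbf{c}_1 \rVert \, \lVert \mathbf{c}_2 \rVert} \qquad (2)$$

Here $\mathbf{c}_1$ and $\mathbf{c}_2$ are the 300-dimensional embedding vectors of concepts c1 and c2.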


Finally, we compute the similarity score as a weighted average of the two scores above. For simplicity, we set both weights w1 and w2 to 0.5.

$$\mathrm{Sim}(c_1, c_2) = w_1 \cdot \mathrm{W2V}(c_1, c_2) + w_2 \cdot \mathrm{PMI}(c_1, c_2) \qquad (3)$$
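The cosine similarity, the weighted combination and the later similarity threshold can be sketched as follows. This is a minimal illustration in plain Python (real embeddings would come from the trained Word2Vec model); the function names are hypothetical, and keeping pairs strictly above 0.5 is our reading of the threshold used in Section 3.3. The three example pairs reuse the PMI and W2V scores reported in Table 2.

```python
import math
from typing import Dict, List, Sequence, Tuple

def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """W2V score: dot product divided by the product of the vector norms."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # undefined for zero vectors; treated as unrelated here
    return dot / (norm_u * norm_v)

def combined_similarity(w2v: float, pmi: float, w1: float = 0.5, w2: float = 0.5) -> float:
    """Sim: weighted combination of the W2V and PMI scores (both weights 0.5 by default)."""
    return w1 * w2v + w2 * pmi

def candidate_pairs(scores: Dict[Tuple[str, str], float], threshold: float = 0.5) -> List[Tuple[str, str]]:
    """Keep the concept pairs whose Sim exceeds the threshold."""
    return [pair for pair, score in scores.items() if score > threshold]


# (PMI, W2V) scores for three pairs, taken from Table 2.
table2 = {("Hadoop", "Hive"): (0.67, 0.43), ("MongoDB", "NoSQL"): (0.64, 0.72), ("ASP.NET", "Java"): (0.17, 0.08)}
sims = {pair: combined_similarity(w2v, pmi) for pair, (pmi, w2v) in table2.items()}
print(candidate_pairs(sims))  # [('Hadoop', 'Hive'), ('MongoDB', 'NoSQL')]
```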


We observed that extracted concepts can appear with different representations in Stack Overflow question tags. Examples include synonymous pairs such as node.js and nodejs, javascript and js, and mvc and model view controller. To identify such instances, we use the Stack Overflow tag synonym API [22] to match extracted concepts to identical or synonymous Stack Overflow tags. We also filter out irrelevant concepts that have no occurrence or synonym among the Stack Overflow tags. After this process, we are left with 5200 concepts. When computing the probabilities for the PMI scores, we also count occurrences of synonyms. For example, when computing the PMI between javascript and any other concept, we compute the independent and co-occurrence probabilities by counting occurrences of both the javascript and js tags in Stack Overflow questions.
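The synonym handling and concept filtering can be sketched as follows. This is a minimal illustration, assuming a synonym table has already been retrieved; the `SYNONYMS` entries are illustrative examples, not actual output of the Stack Overflow tag synonym API, and all function names are hypothetical. Canonicalizing every question's tags before counting makes js and javascript contribute to the same PMI counts.

```python
from typing import Dict, Iterable, List, Set

# Illustrative synonym table: synonym tag -> canonical tag (example entries only).
SYNONYMS: Dict[str, str] = {"js": "javascript", "nodejs": "node.js"}

def canonical(tag: str, synonyms: Dict[str, str]) -> str:
    """Normalize a tag (Stack Overflow tags are lower-case and use hyphens, not spaces)."""
    key = tag.strip().lower().replace(" ", "-")
    return synonyms.get(key, key)

def canonicalize_questions(questions: Iterable[Iterable[str]], synonyms: Dict[str, str]) -> List[Set[str]]:
    """Rewrite each question's tags so that synonyms are counted as one tag."""
    return [{canonical(t, synonyms) for t in tags} for tags in questions]

def filter_concepts(concepts: Iterable[str], questions: List[Set[str]], synonyms: Dict[str, str]) -> List[str]:
    """Keep concepts that occur among the tags (directly or via a synonym), without duplicates."""
    known = set().union(*questions)
    kept: List[str] = []
    for concept in concepts:
        tag = canonical(concept, synonyms)
        if tag in known and tag not in kept:
            kept.append(tag)
    return kept


questions = canonicalize_questions([["js", "jquery"], ["javascript"], ["nodejs", "js"]], SYNONYMS)
print(filter_concepts(["JavaScript", "js", "node.js", "Neural Network"], questions, SYNONYMS))
# ['javascript', 'node.js']
```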


3.3 Identifying Concept Relation 

In this section, we explain the process of identifying prerequisite relationships between concepts. We only consider concept pairs with high semantic similarity scores, since concept pairs with very low scores are very likely unrelated and can be ignored. For example, it is not useful to learn the relationship between Neural Network and PHP, which are unrelated and belong to different domains (deep learning and web development, respectively). In contrast, it is interesting to study the pair Gradient Descent and Backpropagation, which are both algorithms used in machine learning and have a high semantic similarity score: inferring that Gradient Descent is a prerequisite of Backpropagation, and not vice versa, would be useful. To infer such relations, we make use of Wikipedia articles. For each pair of concepts whose semantic similarity exceeds a threshold of 0.5, we compute concept relevancy scores: for concepts c1 and c2, we measure how often c1 is mentioned in the Wikipedia article for c2, and vice versa. Based on these concept relevancy scores, we infer the prerequisite relation. For example, Java is a prerequisite of Spring Boot, so an explanation of Spring Boot (a Java web framework) is likely to mention Java more often than an explanation of Java mentions Spring Boot. Algorithm 1 lists the steps for identifying the prerequisite relation between concepts.


Algorithm 1 Prerequisite relation inference between concepts

Input: A pair of strongly related concepts ci and cj, and Wikipedia knowledge articles.
Output: The prerequisite relationship between the pair, i.e., whether ci is a prerequisite of cj or vice versa.
1: Tokenize the knowledge articles of all concepts in C, where C is the set of concepts
2: for each pair of concepts (ci, cj) do
3:   Compute the Concept Relevancy Scores (CRS) for the ordered pairs (ci, cj) and (cj, ci) as

$$\mathrm{CRS}(c_i, c_j) = \frac{TF(c_j \in D_i)}{V(D_i, D_j)}, \qquad \mathrm{CRS}(c_j, c_i) = \frac{TF(c_i \in D_j)}{V(D_i, D_j)}$$

     where Di and Dj are the Wikipedia articles for concepts ci and cj respectively, TF(cj ∈ Di) is the term frequency of concept cj in article Di, TF(ci ∈ Dj) is the term frequency of concept ci in article Dj, and V(Di, Dj) is a normalization term that captures the total vocabulary of articles Di and Dj.
4:   If CRS(ci, cj) > CRS(cj, ci), then cj is a prerequisite of ci; if CRS(cj, ci) > CRS(ci, cj), then ci is a prerequisite of cj
5: end for
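Algorithm 1 can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the tokenizer, counting a multi-word concept as a contiguous token sequence, and taking V(Di, Dj) as the size of the two articles' combined vocabulary are our choices, not details given in the paper. The function names and the toy article snippets are hypothetical.

```python
import re
from typing import List, Optional, Tuple

# Assumed tokenizer: lower-cased alphanumeric runs; internal dots are kept so that
# names such as "node.js" stay whole, while a sentence-final "." is dropped.
TOKEN_RE = re.compile(r"[a-z0-9+#]+(?:\.[a-z0-9+#]+)*")

def tokenize(text: str) -> List[str]:
    return TOKEN_RE.findall(text.lower())

def term_frequency(concept: str, tokens: List[str]) -> int:
    """Count occurrences of a (possibly multi-word) concept as a contiguous token run."""
    target = tokenize(concept)
    n = len(target)
    if n == 0:
        return 0
    return sum(1 for k in range(len(tokens) - n + 1) if tokens[k:k + n] == target)

def concept_relevancy_scores(ci: str, cj: str, di: List[str], dj: List[str]) -> Tuple[float, float]:
    """Return (CRS(ci, cj), CRS(cj, ci)) for the tokenized articles di (of ci) and dj (of cj)."""
    # Assumed normalization V(Di, Dj): size of the combined vocabulary of both articles.
    v = len(set(di) | set(dj)) or 1
    crs_ij = term_frequency(cj, di) / v  # how often cj is mentioned in ci's article
    crs_ji = term_frequency(ci, dj) / v  # how often ci is mentioned in cj's article
    return crs_ij, crs_ji

def infer_prerequisite(ci: str, cj: str, article_i: str, article_j: str) -> Optional[Tuple[str, str]]:
    """Return (prerequisite, concept), or None when the two scores tie."""
    crs_ij, crs_ji = concept_relevancy_scores(ci, cj, tokenize(article_i), tokenize(article_j))
    if crs_ij > crs_ji:
        return (cj, ci)  # step 4: cj is a prerequisite of ci
    if crs_ji > crs_ij:
        return (ci, cj)  # the reverse case: ci is a prerequisite of cj
    return None


# Toy article snippets, made up for illustration (not actual Wikipedia text).
spring_boot = "Spring Boot is a Java framework. It runs on the Java virtual machine and uses Java annotations."
java = "Java is a general-purpose programming language. Frameworks such as Spring are popular."
print(infer_prerequisite("Spring Boot", "Java", spring_boot, java))  # ('Java', 'Spring Boot')
```

Because both scores share the denominator V(Di, Dj), the normalization does not change which direction step 4 picks; it only matters if scores are compared across different pairs. In the full pipeline, this function would only be called for pairs whose similarity exceeds the 0.5 threshold.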


Table 1: Data collected from online learning platforms

Platform     | # Courses | Categories
Udemy        | 13601     | Software development and design
edX          | 1072      | Software development
Internal LMS | 49202     | Software development and design


3.4 Learning Maps 

The identified prerequisite relation pairs were used to build the concept graph, which has 1325 concepts and 1868 edges. We use the networkx [8] Python library to build the graph, passing the adjacency list created from the identified concept-prerequisite pairs as input. Each edge is directed from a concept node to its prerequisite node. We build a learning map for each concept in the graph using the depth-first search (DFS) algorithm; each learning map is represented as the DFS tree generated by the algorithm. To visualize the learning maps, we use the d3.js force layout [5]. In the visualization we reverse the edge direction, i.e., edges point from the prerequisite node to the concept node, which makes the prerequisites in a learning map easier to identify. The learning maps for the concepts Blockchain and Java Spring framework are shown in Figure 4. The root node, colored blue, represents the main concept, and all nodes below it, colored orange, represent concepts that are prerequisites for the main concept; each child node is a prerequisite concept of its parent node.

Table 2: Semantic similarity scores from Word2Vec Embeddings and PMI

C1               | C2              | PMI score | W2V score
Hadoop           | Hive            | 0.67      | 0.43
MongoDB          | NoSQL           | 0.64      | 0.72
JavaScript       | jQuery          | 0.44      | 0.68
JavaScript       | NodeJS          | 0.57      | 0.61
Neural Network   | Backpropagation | 0.74      | 0.61
Blockchain       | Cryptocurrency  | 0.34      | 0.73
Inheritance      | Polymorphism    | 0.54      | 0.62
ASP.NET          | Java            | 0.17      | 0.08
NodeJS           | Promise         | 0.54      | 0.21
ASP.NET          | C#              | 0.20      | 0.42
Hadoop           | Java            | 0.1       | 0.23
SVM              | Classification  | 0.62      | 0.43
RDBMS            | SQL             | 0.19      | 0.37
Machine Learning | Linear Algebra  | 0.18      | 0.49
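The learning-map construction can be sketched without external dependencies as follows (networkx itself offers `nx.dfs_tree(G, source=...)` for the DFS-tree step). This is a minimal illustration: the function names and the three concept-prerequisite pairs are hypothetical examples, and edges run from a concept to its prerequisite as in the graph described above.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

def build_graph(pairs: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Adjacency list with edges directed from a concept to its prerequisite."""
    graph: Dict[str, List[str]] = defaultdict(list)
    for concept, prerequisite in pairs:
        graph[concept].append(prerequisite)
    return graph

def learning_map(graph: Dict[str, List[str]], root: str) -> List[Tuple[str, str]]:
    """Edges (parent, child) of the DFS tree rooted at root; each child is a prerequisite of its parent."""
    visited = {root}
    tree_edges: List[Tuple[str, str]] = []
    # Iterative DFS: the stack holds (node, iterator over its remaining prerequisites).
    stack = [(root, iter(graph.get(root, [])))]
    while stack:
        node, children = stack[-1]
        for child in children:
            if child not in visited:
                visited.add(child)
                tree_edges.append((node, child))
                stack.append((child, iter(graph.get(child, []))))
                break
        else:
            stack.pop()  # every prerequisite of this node has been explored
    return tree_edges

def for_display(tree_edges: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Reverse the edges (prerequisite -> concept), as done for the visualization."""
    return [(child, parent) for parent, child in tree_edges]


# Illustrative (concept, prerequisite) pairs.
pairs = [("Neural Networks", "Backpropagation"), ("Neural Networks", "Regression"), ("Backpropagation", "Gradient Descent")]
g = build_graph(pairs)
print(learning_map(g, "Neural Networks"))
# [('Neural Networks', 'Backpropagation'), ('Backpropagation', 'Gradient Descent'), ('Neural Networks', 'Regression')]
```

The visited set keeps the traversal finite even if the inferred pairs accidentally contain a cycle.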


Table 3: Extracted prerequisite relation between concepts

C1                  | C2
Distributed systems | Mapreduce
Probability         | Logistic Regression
Encryption          | Cryptography
Smart Contract      | Ethereum
Backpropagation     | Neural Networks
Regression          | Neural Networks
JavaScript          | NodeJS


4. EVALUATION AND RESULTS 
4.1 Datasets 


We collected metadata about courses from MOOC platforms and our internal Learning Management System (LMS) using REST APIs, fetching data from categories relevant to Software Development and Design. The distribution of the number of courses fetched from each platform is shown in Table 1. There are 13,600 courses from Udemy, 1,050 courses from edX, and 49,500 courses from our LMS in the Software Development and Design category. The REST APIs returned JSON with a different schema for each platform, so we chose MongoDB, a NoSQL database, to store the retrieved data.


We pre-process the course metadata. The course descriptions from Udemy contain HTML tags, which we remove by parsing the descriptions with Beautiful Soup [19]. We then remove stopwords and apply lemmatization and stemming to reduce words to their base forms. We also create a custom stopword list by manually analyzing the topic modeling output. The pre-processed data is stored in MongoDB for further processing and evaluation.
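The cleaning steps can be sketched as follows. This is a minimal, dependency-free illustration that swaps Beautiful Soup for Python's standard-library `html.parser` and omits lemmatization and stemming (a library such as NLTK would typically supply those). The stopword lists, including the custom ones, are hypothetical stand-ins for the lists used in the paper.

```python
import re
from html.parser import HTMLParser
from typing import Iterable, List

class _TextExtractor(HTMLParser):
    """Collect the text content of an HTML fragment, discarding the tags."""

    def __init__(self) -> None:
        super().__init__()
        self.parts: List[str] = []

    def handle_data(self, data: str) -> None:
        self.parts.append(data)

def strip_html(html: str) -> str:
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)

# Small illustrative stopword lists; the paper's actual lists are not given.
STOPWORDS = {"a", "an", "and", "in", "is", "of", "the", "this", "to", "will", "you"}
CUSTOM_STOPWORDS = {"course", "learn"}  # hypothetical domain stopwords found by inspecting LDA output

def preprocess(description: str, extra_stopwords: Iterable[str] = CUSTOM_STOPWORDS) -> List[str]:
    """Strip HTML, lower-case, tokenize and drop stopwords (lemmatization and stemming omitted)."""
    text = strip_html(description).lower()
    tokens = re.findall(r"[a-z0-9+#]+(?:\.[a-z0-9+#]+)*", text)
    stop = STOPWORDS | set(extra_stopwords)
    return [t for t in tokens if t not in stop]


print(preprocess("<p>In this course you will <b>learn</b> JavaScript and jQuery.</p>"))  # ['javascript', 'jquery']
```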


4.2 Evaluating extracted concepts 

We apply Latent Dirichlet Allocation (LDA), a topic modeling algorithm, to infer topics from the course descriptions, extracting five topics from each course description. Each topic is a distribution over words that indicates not only which words belong to the topic but also the probability of each word belonging to it. From the topic distribution of a course description, we take the words of the topic with maximum probability and store them as tags in that course's metadata in the database. Figure 2 shows the description and the tags obtained for a JavaScript course on Udemy.


To evaluate the concepts extracted from course descriptions, we use the overlap coefficient to measure the similarity between the concepts extracted from a course description and the concepts tagged by Udemy. The overlap coefficient, or Szymkiewicz-Simpson coefficient, measures the overlap between two finite sets [1]. It is related to the Jaccard index and is defined as the size of the intersection divided by the size of the smaller of the two sets. We define the average concept overlap coefficient as


$$\mathrm{concept\_overlap}(X, Y) = \frac{1}{N} \sum_{n=1}^{N} \frac{\lvert X_n \cap Y_n \rvert}{\min\left(\lvert X_n \rvert, \lvert Y_n \rvert\right)} \qquad (4)$$

where concept_overlap(X, Y) is the average concept overlap between the extracted and tagged concept sets, Xn is the set of concepts extracted by topic modeling for course n, Yn is the set of concepts tagged for course n in the Udemy dataset, and N is the number of course descriptions in the dataset. We observed an average concept overlap coefficient of 0.97. Since the denominator is the size of the smaller set (typically Udemy's tags), this indicates that the extracted concepts almost always include the concepts Udemy tags for a course. Udemy tags each course with at most two concepts, so we further analyzed how well our approach identifies the other concepts in course descriptions that Udemy's tags do not capture. We performed a quantitative analysis with 20 Subject Matter Experts (SMEs), who have 5 to 10 years of experience and have worked on different technologies in IT companies. We randomly sampled 100 courses offered on Udemy and gave five courses to each SME, along with the concepts inferred for each course. The SMEs were asked whether each inferred concept is relevant to the course. Treating the SMEs' responses as true labels, we observed an accuracy of 0.73 for the inferred concepts.

Figure 4: Learning maps for Blockchain and Java Spring framework
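The average concept overlap coefficient can be sketched as follows. This is a minimal illustration; the function names are hypothetical, treating an empty set as zero overlap is our choice (the coefficient is undefined there), and only the first example course uses data from Figure 2, while the second is made up.

```python
from typing import Sequence, Set

def overlap_coefficient(x: Set[str], y: Set[str]) -> float:
    """Szymkiewicz-Simpson overlap: |X ∩ Y| / min(|X|, |Y|)."""
    if not x or not y:
        return 0.0  # undefined for an empty set; treated here as no overlap
    return len(x & y) / min(len(x), len(y))

def average_concept_overlap(extracted: Sequence[Set[str]], tagged: Sequence[Set[str]]) -> float:
    """Mean overlap coefficient over N courses (extracted vs. platform-tagged concepts)."""
    if not extracted or len(extracted) != len(tagged):
        raise ValueError("need one extracted and one tagged concept set per course")
    return sum(overlap_coefficient(x, y) for x, y in zip(extracted, tagged)) / len(extracted)


extracted = [
    {"JavaScript", "jQuery", "array", "string", "dom", "event", "library", "ajax", "object", "loop"},  # Figure 2
    {"python", "pandas"},  # made-up second course
]
tagged = [{"JavaScript"}, {"python", "machine learning"}]
print(average_concept_overlap(extracted, tagged))  # 0.75
```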


4.3 Evaluating Concept Prerequisite Relations
There are no ground truth labels available for the inferred prerequisite relationships. To assess the effectiveness of the prerequisite pairs generated by our approach, we conducted a quantitative analysis with 25 SMEs, who judged whether a concept c1 is a prerequisite for another concept c2. We created five groups of five SMEs each, randomly sampled 250 concept prerequisite pairs, and gave each group 50 of these pairs. We aggregated the responses using majority voting and computed the accuracy of the pairs, treating the SMEs' aggregated responses as ground truth labels. We observed an accuracy of 0.82 for the concept prerequisite pairs. We also measured inter-rater agreement among the experts using Fleiss' kappa [14], a statistical measure of the reliability of agreement among a fixed number of raters who assign categorical ratings to a number of items. If the raters are in complete agreement, κ = 1; if there is no agreement among the raters beyond what would be expected by chance, κ ≤ 0. We observed κ = 0.74, which indicates strong agreement among the raters. We believe some of the disagreement arises because prerequisites can be subjective [12], i.e., it is difficult to reach consensus for some pairs of concepts: individuals acquire knowledge of a topic through different experiences, which can lead to different opinions about its prerequisites. Some of the extracted prerequisite relationships are shown in Table 3.
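The majority-vote accuracy and Fleiss' kappa computations can be sketched as follows. This is a minimal illustration using the standard Fleiss' kappa formula; the vote lists are made up, the function names are hypothetical, and counting a pair as correct when a strict majority of its raters accepts it is our reading of the majority-voting step.

```python
from typing import List, Sequence

def majority_vote_accuracy(votes: Sequence[Sequence[bool]]) -> float:
    """Fraction of inferred pairs accepted by a strict majority of their raters."""
    accepted = sum(1 for pair_votes in votes if sum(pair_votes) > len(pair_votes) / 2)
    return accepted / len(votes)

def votes_to_counts(votes: Sequence[Sequence[bool]]) -> List[List[int]]:
    """Per-pair category counts: [raters saying 'prerequisite', raters saying 'not']."""
    return [[sum(v), len(v) - sum(v)] for v in votes]

def fleiss_kappa(counts: Sequence[Sequence[int]]) -> float:
    """Fleiss' kappa; counts[i][j] is the number of raters putting item i in category j."""
    n_items = len(counts)
    n_raters = sum(counts[0])
    if n_raters < 2 or any(sum(row) != n_raters for row in counts):
        raise ValueError("every item needs the same number (at least 2) of ratings")
    # Observed agreement: per-item agreement P_i, averaged over items.
    p_items = [(sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1)) for row in counts]
    p_bar = sum(p_items) / n_items
    # Chance agreement from the overall proportion of ratings in each category.
    totals = [sum(col) for col in zip(*counts)]
    p_e = sum((t / (n_items * n_raters)) ** 2 for t in totals)
    if p_e == 1.0:
        raise ValueError("kappa is undefined when every rating falls in one category")
    return (p_bar - p_e) / (1 - p_e)


# Made-up votes from five raters on three inferred pairs.
votes = [
    [True, True, True, True, True],
    [True, True, True, False, False],
    [False, False, False, False, True],
]
print(majority_vote_accuracy(votes))  # 0.6666666666666666
print(round(fleiss_kappa(votes_to_counts(votes)), 4))  # 0.3056
```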


5. CHALLENGES 


We faced the following challenges while building the concept graph.


1. Some concepts extracted from course descriptions were ambiguous when looked up in Wikipedia. For example, Java can refer to a programming language or an island in Indonesia. To deal with this issue, we query the Google Search API [7] with each extracted concept and fetch the highest-ranked Wikipedia article in the search results. Because these software concepts are popular, we observed that picking the highest-ranked Wikipedia article returned relevant results.


2. Our inference of prerequisite relationships is based on concept relevancy scores computed from the Wikipedia articles of the concepts, and these scores may not always give accurate results. For example, the article for a concept may have high scores for concepts that are derived from it rather than for its prerequisites, which would reverse the inferred direction.
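The disambiguation step in item 1 can be sketched as follows. This is a minimal illustration that starts after the search: calling the Google Search API is not shown, and the input list of result URLs in rank order is an assumption, as is the function name `top_wikipedia_title`. It returns the article title of the first Wikipedia result.

```python
from typing import Iterable, Optional
from urllib.parse import unquote, urlparse

def top_wikipedia_title(ranked_urls: Iterable[str]) -> Optional[str]:
    """Return the article title of the highest-ranked Wikipedia result, if any."""
    for url in ranked_urls:  # URLs are assumed to be in search-rank order
        parts = urlparse(url)
        host = parts.netloc.lower()
        is_wikipedia = host == "wikipedia.org" or host.endswith(".wikipedia.org")
        if is_wikipedia and parts.path.startswith("/wiki/"):
            # "/wiki/Java_(programming_language)" -> "Java (programming language)"
            return unquote(parts.path[len("/wiki/"):]).replace("_", " ")
    return None


results = [
    "https://www.oracle.com/java/",
    "https://en.wikipedia.org/wiki/Java_(programming_language)",
    "https://en.wikipedia.org/wiki/Java",
]
print(top_wikipedia_title(results))  # Java (programming language)
```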


6. CONCLUSIONS AND FUTURE WORK 


In this paper, we proposed an approach to infer prerequisite relations between concepts and build a concept graph. The proposed method does not require manually annotated data, the major drawback of supervised learning approaches. We draw on different data sources at different steps (course descriptions for concept extraction, Stack Overflow tags and course text for semantic similarity, and Wikipedia articles for prerequisite direction) to incorporate rich semantic information when inferring prerequisite relations. To validate our results, we performed both quantitative and qualitative evaluations. Subject matter experts evaluated the identified concept prerequisite pairs, and we observed an accuracy of 0.82 for the inferred prerequisite relations. We built the concept graph from the prerequisite relation pairs and demonstrated a few examples of learning maps generated from it. Learning maps can be used in many applications, ranging from content-based recommendation systems to more sophisticated online tutoring systems. As future work, we plan to build a personalized curriculum planner that captures the concepts learners currently know and what they want to learn, and uses this information together with the prerequisite relations to create a personalized learning plan for each learner. Although our approach is not limited to the software domain, we plan to carry out further studies and experiments to measure how well the system generalizes to other domains.